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Linear Scaling Solution of the Time-Dependent Self-Consistent-Field Equations 

Matt Challacomb«EI 

Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545 

A new approach to solving the Time-Dependent Self-Consistent-Field equations is 
developed based on the double quotient formulation of Tsiper [J. Phys. B, 34 L401 
(2001)]. Dual channel, quasi-independent non-linear optimization of these quotients is 
found to yield convergence rates approaching those of the best case (single channel) 
Tamm-Dancoff approximation. This formulation is variational with respect to matrix 
truncation, admitting linear scaling solution of the matrix-eigenvalue problem, which 
is demonstrated for bulk excitons in the polyphenylene vinylene oligomer and the (4,3) 
carbon nanotube segment. 
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The Time-Dependent Self-Consistent-Field equations 
together with models that include some portion of the 
Hartree-Fock (HF) exchange admit control over the range 
of self-interaction in the optical response [H, S [HI 5 an d 
are related to new models of electron correlation based 
on the Random Phase Approximation (RPA) [H 0, 0, 
18fl . Solving the TD-SCF equations is challenging due 
to an unconventional J-symmetric structure of the naive 
molecular orbital (MO) representation, 



(1) 



where A and B are Hermitian blocks corresponding to 
4 th order tensors spanning transitions between occupied 
and virtual sub-spaces, uj is the real excitation energy 
and v = f^>) is the corresponding transition density. By 
construction, the MO representation allows strict sepa- 
ration between the dyadic particle-hole (ph) and hole- 
particle (hp) solutions, X and Y, for which specialized 
algorithms exist. Nevertheless, convergence of the naive 
J-symmetric problem is typically much slower than the 
corresponding Hermitian Tamm-Dancoff approximation 
(TDA), AX = ujX, which is of reduced dimensionality in 
the MO representation. 

Several TD-SCF eigensolvers are based on the oscil- 
lator picture ^ (?) = u> (?) , with K = A + B and 

T = A — B the Hermitian potential and kinetic matrices, 
and the dual {p, q } = corresponding 
to position and momentum. This picture avoids the im- 
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whilst admitting conventional solu- 
tions based on the Hermitian matrix G = K • T, as shown 
by Tamara and Udagawa [l7| and extended by Narita 
and Shibuya with second order optimization of the quo- 
tient uj 2 [p, q\ = q ■ G -p/ \p ■ q\ More recently, Tsiper 
considered the quotients 



w [p,q\ 



p-K-p q ■ T ■ q 



2 \p- q] 2 \p- q\ 
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and developed a corresponding dual channel Lanczos 
solver. Subspace solvers in this dual representation have 
recently been surveyed by Tretiak, Isborne, Niklasson 
and Challacombe (TINC) [li^ . with comparative results 
for semi-empirical models. 

Another challenge is dimensionality and scaling. Writ- 
ing Eq. ([I]) in the general form L • v = lov, admitting 
arbitrary representation, the superoperator matrix L is 
a ~ N 2 x N 2 tetradic, with N the number of basis func- 
tions, assumed proportional to system size. In prac- 
tice the action of L onto v is carried out implicitly as 
L[v] = [F, v] + [G[v], P] , using an existing framework 
for construction of the effective Hamiltonian (Fockian) 
F, where P is the one-particle reduced density matrix, 
G is a screening operator involving Coulomb, exchange 
and/or exchange-correlation terms and the correspon- 
dence between superoperator and functional notation is 
given by a tensorial mapping between diadic and matrix, 

V N 2 X1 <& v„ xJV . 

Recent efforts have focused on addressing the prob- 
lem of dimensionality by employing linear scaling meth- 
ods that reduce the cost of £,[■] within Density Func- 
tional Theory (DFT) to 0(N). However, this remains 
an open problem for the Hartree-Fock (HF) exchange, 
an ingredient in models that account for charge transfer 
in the dynamic and static response, including the Ran- 
dom Phase Approximation (RPA) at the pure HF level 
of theory. Likewise, scaling of the TD-SCF eigenprob- 
lem remains formidable due to associated costs of lin- 
ear algebra, even when using powerful Krylov subspace 
methods. Underscoring this challenge, one of the most 
successful approaches to linear scaling TD-DFT avoids 
the matrix eigenproblem entirely through explicit time- 
evolution [Hilll- 

Linear scaling matrix methods exploit quantum local- 
ity, manifest in approximate exponential decay of matrix 
elements expressed in a well posed, local basis; with the 
dropping of small elements below a threshold, r mtx , this 
decay leads to sparse matrices and 0(N) complexity at 
the forfeit of full precision 0, 0, EH- Likewise, linear 
scaling methods for computing the HF exchange employ 
an advanced form of direct SCF, exploiting this decay 



in the rigorous screening of small exchange interactions 
bellow the two-electron integral threshold T2 e 13j . The 



consequence of these linear scaling approximations is an 
inexact linear algebra that challenges Krylov solvers due 
to nested error accumulation, a subject of recent formal 
interest [Til [Tl|. Consistent with this view, TINC found 
that matrix perturbation (a truncation proxy) disrupts 
convergence of Krylov solvers with slow convergence, i.e. 
Lanczos and Arnoldi for the RPA, but has less impact 
on solvers with rapid convergence, i.e. generically for 
the TDA or Davidson for the RPA. Relative to semi- 
empirical Hamiltonians, the impact of incompleteness on 
subspace iteration may be amplified with first principles 
models and large basis sets (ill-conditioning). 

An alternative is Rayleigh Quotient Iteration (RQI), 
which poses the eigenproblem as non-linear optimiza- 
tion and is variational with respect to matrix pertur- 
bation. Narita and Shibuya [lOfl considered optimiza- 
tion of the quotient lo 2 [p, q\ with second order methods, 
but these are beyond the capabilities of current linear 
scaling technologies and also, convergence is disadvan- 
taged by a power of For semi-empirical Hamiltoni- 
ans, TINC found that optimization of the Thouless func- 
tional lo[v\ = v • L • v/ \v- v\ , corresponding to the so- 
lution of Eq. JT}, was significantly slower for the RPA 
relative to the TDA, and also compared to subspace 
solvers. For first principles models and non-trivial basis 
sets, this naive RQI can become pathologically slow as 
shown in Fig. [T] On the other hand, the Tsiper formula- 
tion exposes the underlying pseudo-Hermitian structure 
of the TD-SCF equations. Here, this structure is ex- 
ploited with QUasi-Independent Rayleigh Quotient Iter- 
ation (QUIRQI), involving dual channel optimization of 
the Tsiper quotients coupled only weakly through line 
search. 

Our development begins with a brief review of the rep- 
resentation independent formulation developed by TINC, 
which avoids the 0(N 3 ) cost of rotating into an ex- 
plicit p-h, h-p symmetry. Instead, this symmetry is 
maintained implicitly via annihilation, x f a (x) = P 
■x-Q + QxP, with P the first order reduced den- 
sity matrix and Q — I — P its compliment. Likewise, 
the indefinite metric associated with the J-symmetry of 
Eq. (fT} is carried through the generalized norm (x, y) = 
tr{a; T - [y, P]}. Introducing the operator equivalents, 
L\p] K.p and L[q] <^ T.q , the Tsiper functional 
becomes uj [p, orl = tttt^tt + l?/" L ^i • Transformations 

Lf ) MJ ^ ^ 2|(p,q)| _ 2|(p,q)| 

between the transition density and the dual space in- 
volves simple manipulations and minimal cost, allowing 
Fock builds with the transition density and optimization 
in the dual space. The splitting operation is given by 
p = /+(«) = P v Q + [Q v ■ P] T and q = f- (v) = 
P v Q [Q v P] T . Likewise, L[p] = /_ (L[v\) and 
L[q] = /+ (L[v]). The back transformation (merge) from 
dual to density is v = F(p, q) = (p + q + [p — q] T ) /2. 
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Figure 1: Convergence of RHF/3-21G TDA and RPA with 
the RQI and QUIRQI algorithms for linear decaene (C10H2). 
Calculations were started from the same random guess, and 
tight numerical thresholds were used throughout. In the rep- 
resentation independent scheme, the cost per iteration is the 
same for TDA and RPA. 



This framework provides the freedom to work in any or- 
thogonal representation, and to switch between transi- 
tion density and oscillator duals with minimal cost. 

QUIRQI is given in Algorithm Q] It begins with a 
guess for the transition density, which is then split into 
its dual (lines 2-3). The choice of initial guess is discussed 
later. Lines 4-24 consist of the non-linear conjugate gra- 
dient optimization of the nearly independent channels: 
In each step, the flow of information proceeds from opti- 
mization of the duals to builds involving the density and 
back to the duals in a merge-annihilate-truncate-build- 
split-truncate (MATBST) sequence. For the variables v, 
p and q this sequence is comprised by lines 22-23 and 5-7, 
and lines 15-19 for the corresponding conjugate gradients 
h v , hp and h q . Truncation is carried out with the filter 
operation as described in Ref. (ll) . with cost and error 
determined by the matrix threshold r m t x - 

The Tsiper functional is the sum of dual quotients to p 
and uj q , determined at line 8, followed by the gradients g p 
and g q computed at line 9. After the first cycle, the cor- 
responding relative error e ro ] (10) and maximum matrix 
element of the gradient g max (11) are computed and used 
as an exit criterion at line 4, along with non-variational 
behavior uj > uj° ld . 

Next, the Polak-Ribiere variant of non-linear conjugate 
gradients yields the search direction in each channel, h p 
and h q (12-14). The action of L[] on to h p and h q is 
then computed, again with a MATBST sequence (15- 
19), followed by a self-consistent dual channel line search 
at line 20, as described below. With steps A p and X q in 
hand, minimizing updates are taken along each conjugate 
direction (22), and the cycle repeats with the MATBST 
sequence spanning lines 21-23 and 5-7. 

Optimization of the Tsipper functional aj[A p ,A g ] = 
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Algorithm 1 QUIRQI 
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procedure QUIRQI(cj, v) 
guess v 

P = f+ (v), q = f- (v) 

while e rc i > e and g max > 7 and uj < Lu° ld do 
L[v] = [F, v] + [G[v],P] 
L\p]=f-(L[v]),L[q]=f + (L[v]) 
filter (L[p\, L[q], r mtx ) 

<q,X.[q]> 



2|(p,q)| ' U -Up+UJ q 



_ (P,L[P]) , 

- 2\(p,q)\ ' U 1 

9p = Qu q - L[p], g q = pLo p - L[q] 
e rcl = (o; old - w) /w 



ffmax = max 



9^ d ,< d > 



a — (9p-9p~9p d ) a _ {9 q ,9 q ~9° q id ) 
PP ~ <S? ld ,S? ld > ' P? ~ /"° ld "° ld ^ 



w ^^,g p ^g p ,g q ^g q 

hp <- g p + Pphp, h q ^g q + /3 q h q 

h v — F(h p , h q ), h v <— f a (h v ) 

filter (hp, h q , h v , r mtx ) 

L[h v ] = [F, h v ] + [G[h v ], P] 

L[h p ] = /_ [L[h v ]), L[h q ] = f+ (L[h v ]) 

filter (L[h p ], L[h q ], r mtx ) 

{X p , \ q } = argmin oj [p + X p h p , q + \ q h q ] 

{\ p ,\ q y 

P <r- p+ Xphp, q «- q + X q h q 

v <- F(p, q), v <- f a (v) 
filter (p,q,v, r mtx ) 
end while 
end procedure 



uj \p + Xph p , q + Xqhq] involves a two dimensional line- 
search (line 20) corresponding to minimization of 

Ap + XpBp + XpCp + A q + XqBq + X^C q 



XpSpq 



XqTpq -\- XpXqUpq 



Oj[X p ,Xq] 

(3) 

with coupling entering through terms in the denomina- 
tor such as U P q = (h p ,h q ). A minimum in Eq. ([3]) can 
be found quickly to high precision by alternately substi- 
tuting one-dimensional solutions one into the other until 
self-consistency is reached. This semi-analytic approach 
starts with a rough guess at the pair {X p , X q } (eg. found 
by a coarse scan) followed by iterative substitution, where 
for example the p-channel update is 



<-{[(2C, 

\_BpRpq 



pRpq 



1CpX q Sp q ) 4 (CpTpq + CpXqUpq) 



BpX q Sp q 



\Aq + Ap 



B q Xq 



ax) T. 



— (AqXq + AqXq + B q X q + C q X q ) U pq ] 



1 n q) 
1/2 



l"l 



(4) 



*2CpRpq 



2Q 



• XqSpq J / [2Cp (Tpq + XqUpq)] 



with an analogous update for the g-channel obtained by 
swapping subscripts. As the solution decouples (S pq , T pq 
and Upq become small) the steps are found independently. 

QUIRQI has been implemented in FreeON 0], which 
employs the linear scaling Coulomb and Hartree-Fock ex- 
change kernels QCTC and ONX with cost and accuracy 



controlled by the two-electron screening threshold T2 C 
[l3| . A-scaling solution of the QUIRQI matrix equations 
is achieved with the sparse approximate matrix-matrix 
multiply (SpAMM), with cost and accuracy determined 
by the drop tolerance T mtx [HHL U|- All calculations were 
carried out with version 4.3 of the gcc/gfortran compiler 
under version 8.04 of the Ubuntu Linux distribution and 
run on a fully loaded 2GHz AMD Quad Opteron 8350. 

For systems studied to date, QUIRQI is found to con- 
verge monotonically with rates comparable to the TDA 
as shown in Fig. [1] Based on the comparative perfor- 
mance presented by TINC, the TDA rate of convergence 
appears to be a lower bound for RPA solvers. In addition 
to the convergence rate, performance is strongly deter- 
mined by the initial guess. The following results have 
been obtained using the polarization response density 
along the polymer axis [19(, which can be computed in 
O(N) by Perturbed Projection [2(j. Also, a relative pre- 
cision of 4 digits in the excitation energy is targeted with 
the convergence parameters e = 10~ 4 and 7 = 10 -3 , with 
exit from the optimization loop on violation of monotonic 
convergence (u> > ui old due to precision limitations asso- 
ciated with linear scaling approximations). 

In Fig. [2l linear scaling and convergence to the bulk 
limit are demonstrated for a series of polyphenylene 
vinylene (PPV) oligomers at the RHF/6-31G** level 
of theory for the threshold combinations {T mtx ,T2 } = 
{10- 4 ,10~ 5 } and {10" 5 ,10^ 6 }. Significantly more con- 
servative thresholds have been used for the Coulomb 
sums, which incur only minor cost. Convergence 
is reached in 24 — 25 iterations, with the cost of 
Coulomb summation via QCTC comparable to the cost of 
SpAMM (r mtx = 10~ 4 ). In Fig. El linear scaling and con- 
vergence to the bulk limit are demonstrated for a series 
of (4,3) carbon nanotube segments at the RHF/3-21G 
level of theory for the same threshold combinations, again 
with convergence achieved in about 24-25 cycles. In both 
cases, tightening the pair {T m t x ,T2 e } leads to a system- 
atically improved result. While the {10 -4 , 10~ 5 } thresh- 
olds that work well for PPV lead to a non-monotone be- 
havior with respect to extent for the nanotube series, 
dropping one more decade to {l0 -5 ,10 -6 } leads to a 
sharply improved behavior. Dropping thresholds further 
to {l0~ 6 , 10 -7 } yields identical results to within the con- 
vergence criteria (~ four digits) across the series, also 
scaling with N but at roughly twice the cost. 

These results demonstrate that QUIRQI can achieve 
both systematic error control and linear scaling in solu- 
tion of the RPA eigenproblem for systems with extended 
conjugation. Relative to PPV, the greater numerical sen- 
sitivity encountered with the nanotube series is consistent 
with the ground state problem, where a smaller band gap 
and greater atomic connectivity typically demand tighter 
thresholds. 

QUIRQI exploits decoupling of the Tsipper functional 
into nearly independent, pseudo-Hermitian quotients 



4 



leading to aggressive convergence rates equivalent to the 
fully Hermitian TDA, while remaining variational with 
respect to matrix truncation (r mtx ). However, QUIRQI 
is not variational with respect to the screening parame- 
ter T2e- It can be systematically improved by tightening 
T20 though, due to rigorous error bounds based on the 
Schwartz inequality [13| . These properties present op- 
portunities for more precise error control as suggested 
by Rubensson, Rudberg and Salek [12|. Further, these 



properties are expected to hold even for the most general 
SCF models, with the only difference being an increas- 
ingly localized transition density matrix and larger cost 
prefactor with an increasing DFT component. Finally, 
a variational approach allows considerable flexibility in 
the path to solution, as errors due to approximation can 
be overcome by optimization, offering opportunities for 
single precision GPU acceleration, variable thresholding, 
incremental Fock builds as well as extrapolation tech- 
niques. 




Figure 3: Approach to the bulk limit of the first excited state 
of the (4,3) carbon nanotube segment at the 3-21G/RPA level 
of theory, with inset showing linear scaling cost for HF ex- 
change (ONX), sparse linear algebra (SpAMM) and Coulomb 
sums (QCTC). 




Figure 2: Approach to the bulk limit of the PPV first excited 
state at the 6-31G**/RPA level of theory, with inset showing 
linear scaling cost for HF exchange (ONX) and sparse lin- 
ear algebra (SpAMM). The cost of Coulomb sums with much 
tighter thresholds are comparable to those for the SpAMM. 
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